A Common-Factor Approach for Multivariate Data Cleaning with an Application to Mars Phoenix Mission Data
Abstract
Data quality is fundamental to ensuring that data are reliable enough for stakeholders to make decisions. In real-world applications, such as the scientific exploration of extreme environments, it is unrealistic to require the collected raw data to be perfect. When it is infeasible for us, as data miners, to physically determine why and how the data were corrupted in order to clean them, we propose to seek the intrinsic structure of the signal and identify the common factors of multivariate data. Using our new data-driven learning method, the common-factor data cleaning approach, we address an interdisciplinary challenge in multivariate data cleaning: complex external impacts that interfere with multiple data measurements at once. Existing data analyses typically process one signal measurement at a time without considering the associations among all signals. We analyze all signal measurements simultaneously to find the hidden common factors that drive all measurements to vary together, variation that does not come from the true quantities being measured. We use these common factors to reduce the variation in the data without changing its base mean level, so that the physical meaning is not altered. We have reanalyzed the NASA Mars Phoenix mission data used in the leading effort by Kounaves’s team (Kounaves is the lead scientist for the wet chemistry experiment on Phoenix) [1, 2] with our proposed method to show the resulting differences. We demonstrate that the new common-factor method successfully helps reduce systematic noise without a definitive understanding of its source and without degrading the physical meaning of the signal.
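The abstract does not state how the common factors are estimated. As an illustration only, the following Python sketch assumes they can be approximated by the leading principal components of the centered measurement matrix (a swapped-in technique, not necessarily the authors' procedure). It removes the variation along the first `k` principal directions and then adds each channel's mean back, so the base mean level of every signal is unchanged while shared fluctuations are suppressed. The function names, the power-iteration solver, and the three-channel example data are hypothetical.

```python
from typing import List

# Assumption: "common factors" are approximated by leading principal components.
# This is an illustrative stand-in, not the paper's stated algorithm.
Matrix = List[List[float]]  # rows = time samples, columns = signal channels


def column_means(data: Matrix) -> List[float]:
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def column_variances(data: Matrix) -> List[float]:
    means = column_means(data)
    n = len(data)
    return [sum((row[j] - means[j]) ** 2 for row in data) / (n - 1)
            for j in range(len(means))]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance (p x p) of data whose columns are already centered."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvector(cov: Matrix, iters: int = 500) -> List[float]:
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    p = len(cov)
    v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return [0.0] * p  # no variance left along any direction: nothing to remove
        v = [x / norm for x in w]
    return v


def remove_common_factors(data: Matrix, k: int = 1) -> Matrix:
    """Remove the k leading principal directions, then restore the column means."""
    means = column_means(data)
    resid = [[x - m for x, m in zip(row, means)] for row in data]
    for _ in range(k):
        v = leading_eigenvector(covariance(resid))
        projected = []
        for row in resid:
            score = sum(x * vj for x, vj in zip(row, v))  # factor value at this sample
            projected.append([x - score * vj for x, vj in zip(row, v)])
        resid = projected  # deflation: the next pass finds the next direction
    # Centered scores sum to zero over samples, so adding the means back keeps base levels.
    return [[x + m for x, m in zip(row, means)] for row in resid]


# Hypothetical example: three channels sharing one drift term plus small noise.
drift = [0.0, 2.0, -1.0, 3.0, -2.0, 1.0, -3.0, 0.5]
noise_a = [0.1, -0.2, 0.05, 0.0, 0.15, -0.1, 0.05, -0.05]
noise_b = [-0.1, 0.1, 0.0, 0.2, -0.05, 0.05, -0.15, -0.05]
noise_c = [0.05, 0.0, -0.1, 0.1, 0.0, -0.05, 0.1, -0.1]
data = [[10.0 + d + a, 20.0 + 1.5 * d + b, 30.0 + 0.8 * d + c]
        for d, a, b, c in zip(drift, noise_a, noise_b, noise_c)]
cleaned = remove_common_factors(data, k=1)
```

Because the centered scores sum to zero across samples, subtracting the factor contribution cannot shift any column mean; only the co-varying part of the signals is reduced. In practice a library eigensolver (for example from NumPy) would replace the hand-written power iteration, and choosing `k` would require domain judgment about how many shared disturbances are present.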
Journal: CoRR
Volume: abs/1510.01291
Publication date: 2015